Skip to content

Conversation

MannDP
Copy link
Collaborator

@MannDP MannDP commented Sep 9, 2025

What changes are proposed in this pull request?

This PR is (2/2) to support writing domain metadata. It adds support for removing metadata for user-specified domains. As per the Delta specification, for a removed domain metadata "writers should preserve an accurate pre-image of the configuration". Thus for any removals, we need to perform a log replay and restore the original configuration of the domain with the removed flag set to true. Furthermore, for any removal where the domain does not already exist in the log, we treat this as a no-op and do not write any record to the Delta Log.

How was this change tested?

Added new integration tests in write.rs.

Copy link

codecov bot commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 98.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.89%. Comparing base (05e6698) to head (535d242).
⚠️ Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/transaction/mod.rs 97.22% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1275      +/-   ##
==========================================
+ Coverage   84.55%   84.89%   +0.33%     
==========================================
  Files         113      113              
  Lines       28009    28830     +821     
  Branches    28009    28830     +821     
==========================================
+ Hits        23683    24475     +792     
- Misses       3189     3197       +8     
- Partials     1137     1158      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch from fd368f5 to dece4cd Compare September 11, 2025 00:23
@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch 2 times, most recently from a568124 to 2c50fd0 Compare September 11, 2025 19:03
@MannDP MannDP changed the title <WIP> feat: support writing domain metadata (2/2) feat: support writing domain metadata (2/2) Sep 11, 2025
@github-actions github-actions bot added the breaking-change Change that require a major version bump label Sep 11, 2025
@MannDP MannDP marked this pull request as ready for review September 11, 2025 19:05
@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch from 2c50fd0 to f7319a6 Compare September 17, 2025 16:13
@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch from f7319a6 to 05ae907 Compare September 17, 2025 16:18
MannDP added a commit that referenced this pull request Sep 24, 2025
## What changes are proposed in this pull request?
<!--
Please clarify what changes you are proposing and why the changes are
needed.
The purpose of this section is to outline the changes, why they are
needed, and how this PR fixes the issue.
If the reason for the change is already explained clearly in an issue,
then it does not need to be restated here.
1. If you propose a new API or feature, clarify the use case for a new
API or feature.
  2. If you fix a bug, you can clarify why it is a bug.
-->
This PR is (1/2) to support writing domain metadata.
1. (this PR) support adding or updating a domain metadata configuration
2. (follow up) support _removing_ a domain metadata,
#1275.

- resolves #1270

Writing domain metadata, as per the [Delta
protocol](https://arc.net/l/quote/fwjwtumy) requires:
- The table must be on writer version 7, and a feature named
`domainMetadata` must exist in the table's `writerFeatures`. We satisfy
this via an explicit check in `Transaction::commit`.
- Two overlapping transactions conflict if they both contain a domain
metadata action for the same metadata domain. We satisfy this with
Kernel's coarse grained conflict detection based on the fact that the
`delta_log` dir already has a log file with the name this txn was going
to write. No new logic needed in this PR.

## How was this change tested?
<!--
Please make sure to add test cases that check the changes thoroughly
including negative and positive cases if possible.
If it was tested in a way different from regular unit tests, please
clarify how you tested, ideally via a reproducible test documented in
the PR description.
-->
Added new integration tests in `write.rs`.
@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch 2 times, most recently from f4bfcd5 to ef7ca8a Compare September 24, 2025 20:35
@MannDP MannDP force-pushed the manndp/remove_domain_metadata branch from ef7ca8a to 7897691 Compare September 24, 2025 20:37
pub(crate) fn remove(domain: String) -> Self {
Self {
domain,
configuration: String::new(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this how existing implementations do removes? I kinda figured it would retain the old configuration but haven't dug into this yet

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great call out! i went back and looked at the protocol again. "Writers should preserve an accurate pre-image of the configuration" -> so we should keep it around actually.

Copy link
Collaborator

@lbhm lbhm Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retaining the old configuration would require log replay though, right? We could accept configuration: Option<String> as an additional parameter to let the caller provide the old value if already known, but TBH I'm not sure if that improves the current solution.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that it requires log replay. But as long as we do that once it should be okay. I checked the Java version of the delta kernel for inspiration and they follow this pattern where remove takes only the domain and they internally determine the pre-image configuration value to preserve. Code ptr: https://github.com/delta-io/delta/blob/master/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TransactionImpl.java#L230C15-L230C35.

The other thing is if we let users pass it in, we likely should validate that the pre-image they provide is correct. Which would require log replay anyways.

Copy link
Collaborator

@lbhm lbhm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. I left a few nits and one suggestion for improving test coverage.

pub(crate) fn remove(domain: String) -> Self {
Self {
domain,
configuration: String::new(),
Copy link
Collaborator

@lbhm lbhm Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retaining the old configuration would require log replay though, right? We could accept configuration: Option<String> as an additional parameter to let the caller provide the old value if already known, but TBH I'm not sure if that improves the current solution.

}

/// Remove domain metadata from the Delta log.
/// This creates a tombstone to logically delete the specified domain. We don't check
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: IMO it is good style to have a paragraph break (i.e., empty line) after the one-line summary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I'm not following what you mean. Do you mean the "We don't check" should start on the next line perhaps?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant

/// Remove domain metadata from the Delta log.
///
/// This creates a tombstone to logically delete the specified domain. We don't check

See this clippy lint for reference.

}

// Create a new DomainMetadata action to remove a domain.
pub(crate) fn remove(domain: String) -> Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like remove is a somewhat misleading name for a method that is actually a constructor. I don't have a strong opinion here, but maybe something like new_tombstone communicates the function's intent more clearly?

Copy link
Collaborator

@OussamaSaoudi OussamaSaoudi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests! Just some code reshuffling then we're good to go 👍

Comment on lines +348 to +362
let user_domains = self
.domain_metadatas
.iter()
.filter_map(move |dm: &DomainMetadata| {
if dm.is_removed() {
existing_domains.get(dm.domain()).map(|existing| {
DomainMetadata::remove(
dm.domain().to_string(),
existing.configuration().to_string(),
)
})
} else {
Some(dm.clone())
}
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can consume the domain_metadatas and avoid the clone since it seems like we don't use it anymore

Suggested change
let user_domains = self
.domain_metadatas
.iter()
.filter_map(move |dm: &DomainMetadata| {
if dm.is_removed() {
existing_domains.get(dm.domain()).map(|existing| {
DomainMetadata::remove(
dm.domain().to_string(),
existing.configuration().to_string(),
)
})
} else {
Some(dm.clone())
}
});
let user_domains = self
.domain_metadatas
.into_iter()
.filter_map(move |dm: DomainMetadata| {
if !dm.is_removed() {
return Some(dm);
}
existing_domains.get(dm.domain()).map(|existing| {
DomainMetadata::remove(
dm.domain().to_string(),
existing.configuration().to_string(),
)
})
});

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to use removed_domains this becomes:
We can consume the domain_metadatas and avoid the clone since it seems like we don't use it anymore

Suggested change
let user_domains = self
.domain_metadatas
.iter()
.filter_map(move |dm: &DomainMetadata| {
if dm.is_removed() {
existing_domains.get(dm.domain()).map(|existing| {
DomainMetadata::remove(
dm.domain().to_string(),
existing.configuration().to_string(),
)
})
} else {
Some(dm.clone())
}
});
let user_domains = self
.removed_domains
.iter()
.map(move |domain_name: &str| {
existing_domains.get(domain_name).map(|existing| {
DomainMetadata::remove(
dm.domain().to_string(),
existing.configuration().to_string(),
)
})
})
.chain(domain_metadatas);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will probably apply suggestion 1 or 2 depending on our takeaways from the other threads :)

Comment on lines +341 to +346
// fetch previous configuration values (requires log replay)
let existing_domains = if has_removals {
scan_domain_metadatas(self.read_snapshot.log_segment(), None, engine)?
} else {
HashMap::new()
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we need to verify that the domain metadata exists. After all, Delta never checks that a removed file is actually present. Is this something that we need to do? If so, can you put a comment explaining why we need to read the entire log and set the removed configuration?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment (that the Delta spec requires it). let's bottom out on this in one of the above threads.

Comment on lines +290 to +292
// actual configuration value determined during commit
self.domain_metadatas
.push(DomainMetadata::remove(domain, String::new()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having the dummy data and checking below whether we have a removal, why not maintain a separate removed_domains: HashSet<String> that we fill in after the log replay?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: only do this if we actually do need to preserve the removed domain metadata's configuration. See my comment here: https://github.com/delta-io/delta-kernel-rs/pull/1275/files#r2400384116

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +322 to 334
if dm.is_internal() {
return Err(Error::Generic(
"Cannot modify domains that start with 'delta.' as those are system controlled"
.to_string(),
));
}
if !domains.insert(domain_metadata.domain()) {

if !seen_domains.insert(dm.domain()) {
return Err(Error::Generic(format!(
"Metadata for domain {} already specified in this transaction",
domain_metadata.domain()
dm.domain()
)));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These feel like checks that should be done in with_domain_metadata:

  • if dm.is_internal()
  • if !seen_domains.insert(dm.domain())

Generally I prefer to fail early. Feel free to make this a followup issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree with the fail fast principle, but this case warrants an exception imo.

We're treating these methods as builder methods (return Self, even though there is no explicit build call) so we should present a fluent builder API. If we do the checking of the error conditions at this point, then we need to return DeltaResult<Self>, which is an awkward interface for both Rust (chaining requires try operator), but particularly for engines across the FFI boundary.

wdyt?

Comment on lines +336 to +338
if dm.is_removed() {
has_removals = true;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linearly checking domain metadatas can be avoided by maintaining a separate hashmap. See here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's bottom out on deferred error checking or not above. Because if we choose to defer error checking, then we cannot make it a HashMap. If we decide to not defer, then will refactor accordingly.

Copy link
Collaborator Author

@MannDP MannDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tysm for the review! wanted to align on 2 high-level things so added thoughts on.

  1. preserve pre-image of removed configuration
  2. error checking, fail-fast or in commit

Comment on lines +290 to +292
// actual configuration value determined during commit
self.domain_metadatas
.push(DomainMetadata::remove(domain, String::new()));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +322 to 334
if dm.is_internal() {
return Err(Error::Generic(
"Cannot modify domains that start with 'delta.' as those are system controlled"
.to_string(),
));
}
if !domains.insert(domain_metadata.domain()) {

if !seen_domains.insert(dm.domain()) {
return Err(Error::Generic(format!(
"Metadata for domain {} already specified in this transaction",
domain_metadata.domain()
dm.domain()
)));
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally agree with the fail fast principle, but this case warrants an exception imo.

We're treating these methods as builder methods (return Self, even though there is no explicit build call) so we should present a fluent builder API. If we do the checking of the error conditions at this point, then we need to return DeltaResult<Self>, which is an awkward interface for both Rust (chaining requires try operator), but particularly for engines across the FFI boundary.

wdyt?

Comment on lines +336 to +338
if dm.is_removed() {
has_removals = true;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's bottom out on deferred error checking or not above. Because if we choose to defer error checking, then we cannot make it a HashMap. If we decide to not defer, then will refactor accordingly.

Comment on lines +341 to +346
// fetch previous configuration values (requires log replay)
let existing_domains = if has_removals {
scan_domain_metadatas(self.read_snapshot.log_segment(), None, engine)?
} else {
HashMap::new()
};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment (that the Delta spec requires it). let's bottom out on this in one of the above threads.

@MannDP MannDP requested a review from OussamaSaoudi October 3, 2025 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Change that require a major version bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support domain metadata writes
4 participants